IxVM kernel: cut Aiur FFT cost on reduction- and check-heavy shards by arthurpaulino · Pull Request #438 · argumentcomputer/ix

arthurpaulino · 2026-06-10T11:31:52Z

Summary

Thirteen commits in the Aiur IxVM kernel: ten optimization areas — each
targeting a specific hotspot found by profiling per-function FFT cost on real
shards or single constants of a compiled Init environment — plus a CLI
ergonomics fix for resolving private Lean.Name arguments. No semantics change:
every shard still checks without error, and the full lake test -- --ignored ixvm suite passes (297 tests) at every step.

Cumulative effect on a reduction-heavy shard (Init shard 9, one owned constant,
deep reduction): 45.5G → 16.6G total FFT (−64%). The structural / keyed-lookup
areas (1–5) account for 45.5G → 37.8G; the Nat-reduction area (6) takes it the
rest of the way by collapsing two symbolic-arithmetic blowups from O(n) / O(n²)
to O(1). That same area makes a previously intractable proof class — the UTF-8
codec lemma ByteArray.utf8DecodeChar?_utf8EncodeChar_append, whose checking
spawns deep Nat.add x 0xC0 succ-towers and c >>> 6/12/18 division-towers —
reduce in constant rather than tower-depth cost. The deserialization (7),
memoization (8), substitution (9), and primitive-gauntlet (10) areas then cut
ingress and reduction cost across all consts.

Those last areas make another previously intractable proof checkable:
Vector.extract_append._proof_1 (an omega-backed Vector.extract/++ lemma
whose check is ~50% beta-substitution) went from OOM (>14G RAM) to 56.6G FFT
(area 8, which lets it complete at all) to 38.3G (area 9, −32%) to 28.0G
(area 10, a further −27%) — −50% from the first completing run.

Method

Built a .ixe of the Init closure, profiled and sharded it
(ix profile → ix shard --shards 64), then ran ix check per shard or per
constant and read the per-function FFT table (--stats-out). Shard 10
(check-heavy: lookups, address compares, deserialize) and shard 9
(reduction-heavy: whnf, substitution, spine collection) exercise different
profiles; small hand-built or hand-picked reproducers (x + D = …, x >>> D = …,
Int.emod_emod_of_dvd) isolate and scale individual blowups.

Aiur cost model behind the changes: FFT ≈ per-function width × row height,
charged at the function's max-arm width on every row. Levers: (a) factor a
wide/rare arm into its own circuit; (b) cut row count via better memoization;
(c) replace linear scans with keyed lookups; (d) keep symbolic primitives
compact so they never materialize a tower.

Areas

1. Cheaper Expr.Share / universe lookups — commit 3774860.
The sharing-table and universe-array lookups walked the list with
list_lookup_u64 (per-step U64 predecessor). Switched to a field index and, for
Expr.Share, to list_drop — which returns the sublist pointer, so repeated
lookups dedup in the memo cache. Shard 10: total FFT 3.89G → 2.11G; the merged
pointer-list walk fell from 1.93G to 5.8M.

2. Hot/cold split of address_eq — commit 69b6936.
The primitive-dispatch gauntlet compares each Const head against ~50 hardcoded
addresses, making address_eq (a full 32-limb compare, width 109) the hottest
circuit. Almost every comparison differs at limb 0. Split it: address_eq
compares limb 0 only and delegates the rest to address_eq_tail, whose height is
just the rare limb-0 matches. (A nested match in the same function does nothing —
Aiur charges full width per row regardless of arm; the split must be a function
boundary.) Shard 10: width 109 → 80, 509.8M → 374.1M; total 2.106G → 1.973G.

3. Hot/cold split of whnf_with_spine — commit 0e30d78.
On reduction-heavy consts this is the single hottest circuit (width 89 × 3.36M
rows), its width set by two arms a minority of rows reach: Const-head and
Proj-head dispatch. Factored both into whnf_const_head / whnf_proj_head. The
Proj arm is the jackpot — 3.4K rows but nearly half the width. Shard 9: width
89 → 34, 6.48G → 2.48G.

4. Non-tail, single-definition collect_spine — commit 5b7fabc.
collect_spine peeled the App chain with a tail-recursive accumulator. Tail
recursion buys nothing in Aiur (stack depth is free), and the accumulator landed
in the memo key, blocking reuse — 0 cache hits across 5.38M rows. Rewrote it
non-tail, keyed on the expression alone (2.19M hits now), and merged the two
verbatim copies into one definition in kernelTypes. Shard 9: 2.65G → 0.96G.

5. Keyed skip-set for sharded check — commit a52967f.
The largest address_eq source on a sharded check was addr_in_list, an O(N)
linear scan of the assumption-leaf list per checked const (1.81M calls on shard
9). Replaced with an RBTreeMap skip-set keyed on the first 4 address bytes:
build once, membership = one rbtree lookup + one confirming address_eq. Shard 9:
address_eq 2.07M rows / 3.48G → 274K rows / 0.40G; total 41.7G → 37.8G.

6. Compact symbolic Nat reduction — commits dbc4177, 5dcab7f, f7cfe23.
Nat primitives applied to (symbolic base, literal) delta-unfolded into towers
the kernel then re-walked: Nat.add x 0xC0 → succ^192(x), and Nat.div x 2
(what x >>> k unfolds to, k times) → an exploding division algorithm. Three
coordinated commits keep them compact:

dbc4177: try_nat_linear_rec reduces Nat.rec base (succ-step) (Lit n) with
a symbolic base directly to Nat.add base (Lit n) instead of n iota steps.
5dcab7f: whnf (try_nat_offset_stuck) leaves Nat.add base (Lit n) stuck as
a compact offset; def-eq (nat_offset_of + try_def_eq_nat) decomposes each
side to base + KLimbs and decides base ≟ base on equal offsets.
f7cfe23: generalizes the stuck rule to Nat.div / Nat.mod base (Lit k)
(k ≥ 2), so symbolic >>> reduces to nested-stuck-div in O(1).

Benchmarks: x + D = Nat.rec x succ-step D goes O(D) → depth-independent
(328M → 28.5M at D=512, −91%); x >>> 18 = (x >>> 9) >>> 9 goes 15.65G → 455M
(−97%), matching x >>> 6. All magnitudes stay KLimbs (no Goldilocks
collapse); only de Bruijn / top-position indices are G.

7. Narrow Ixon expr deserialization — commit fa3447e.
On large-closure consts, ingress deserialization dominates: the two hottest
parsers are get_expr and get_app_telescope, each carrying width set by a
single arm. Split get_expr's Let arm (three sub-get_expr calls + triple
store) into get_expr_let, dropping get_expr width 102 → 75. Changed
get_app_telescope's func: Expr parameter to func: &Expr (load inside),
dropping width 95 → 78. Ingress profile (2000-const UTF-8 closure, run_check
load-only): get_expr 766M → 563M, get_app_telescope 477M → 392M, total
ingress 3.13G → 2.85G.

8. lbr-trimmed context keys for whnf / infer / def-eq memoization — commit
d96c738.
Aiur auto-memoizes whnf / k_infer / k_is_def_eq on the full
(expr, types, …) argument tuple, so the same closed reduction gets a fresh
memo key at every binder depth and misses. Mirror Rust's
whnf_key / infer_key / def_eq_ctx_key (src/ix/kernel/tc.rs): trim types
to the suffix reachable from the expression's loose-bvar range (ctx_trim /
ctx_reachable, a fixpoint that closes over kept frames' own ranges) before the
memoized call. A closed subterm (lbr == 0) keys to the empty context and shares
its result across every depth. Eager substitution is unchanged (matches the
Lean / Rust reference); the win is purely cache keying. Fast paths
(closed → Nil, full-reach → as-is) keep the boundary cheap.
FFT (ix check <const> --stats-out): Nat.add_comm 58.28M → 57.90M (−0.66%);
Int.emod_emod_of_dvd 5.090G → 4.691G (−7.85%, inference / omega-heavy); Init
shard 10 4.738G → 4.721G (−0.37%). Net-positive everywhere measured.

9. Multi-arg beta with a narrow substitution walk — commit 3203691.
whnf_apply_beta substituted one spine argument per expr_inst1, re-walking
the (shrinking) body for each, so a K-argument application (λλλ. body) x y z
walked the body K times. Substitution (expr_inst1 + expr_inst1_walk) was
~53% of FFT on omega-heavy proofs. Mirror Rust's multi-arg beta
(src/ix/kernel/whnf.rs:541-567): peel_beta consumes the whole telescope and
expr_inst_many (= simul_subst) substitutes all consumed args in one walk.
Single-arg betas stay on the cheap expr_inst1 path, so short telescopes don't
pay the list overhead — the reason an earlier unconditional simul_subst
regressed and was reverted. A hot/cold split factors the multi-arg walk's BVar
arm (list_length / list_lookup / expr_lift) into expr_inst_many_bvar,
keeping the hot App/Lam walk narrow (width 82 → ~53, the same pattern as areas
2–3). FFT (ix check <const> --stats-out):
…Vector.extract_append._proof_1 56.64G → 38.31G (−32.4%); Nat.add_comm
57.90M → 56.17M; Init shard 10 4.721G → 4.668G (−1.1%); shard 9 16.346G →
16.314G.

10. Gate the primitive-op gauntlet behind a memoized classifier — commit
87cb8b0.
whnf_const_head ran five try_* dispatchers (nat / str / bitvec / native /
decidable) in sequence on every Const head, each comparing the head
address against its family's primitive addresses — wasted on the common
non-primitive head. Worse, several try_* speculatively WHNF their arguments
(to test for literal / decidable operands), so a non-primitive head still
triggered deep argument reductions that were then thrown away. Factor the chain
into try_address_primitives and gate it behind prim_any_addr: a sum of the 43
head-dispatched address_eq checks, memoized per head address (computed once
per const, not per whnf). A 0 means no try_* can match, so dispatch jumps
straight to projection-definition / delta. Validated by a differential assert
(prim_any_addr must be 1 whenever the gauntlet actually matched) run over the
suite + Int.emod_emod_of_dvd + Vector._proof_1; it caught two missing
addresses (string_utf8_byte_size, punit_size_of_1), then passed across
millions of reductions — assert removed afterward. FFT (ix check <const>):
…Vector.extract_append._proof_1 38.31G → 28.02G (−26.8%, the skipped
speculative arg-WHNF); Int.emod_emod_of_dvd 4.69G → 3.97G (−15%);
Nat.add_comm 56.17M → 56.06M; Init shard 9 16.314G → 16.279G; shard 10
4.668G → 4.659G.

Tooling

Resolve private Lean.Name args in check/prove — commit 6ec124a.
parseName (naive dot-split; an all-digit segment → Name.mkNum) can't rebuild
a private name like _private.Init.Data.Vector.Extract.0.…_proof_1 from its
display string, so ix check / ix prove reported it "not found". Added
toString fallbacks — resolveName (compiled-in env) and resolveIxeAddr
(.ixe env) — that match the arg against each constant's displayed name when the
direct lookup misses. Covers both commands (shared forEachClaim), across the
compiled-in and --ixe paths. Tooling only; no kernel change.

Soundness

Areas 1, 3, 4, 7 are structural / refactor rewrites with identical results
(each shard still checks).
Area 2: address_eq returns the same value as a full 32-limb compare — one
differing limb proves inequality.
Area 5: key collisions overwrite, sound because the only effect is a missed
skip (a frontier const gets rechecked, never wrongly trusted); the confirming
address_eq rules out false positives; the 4-byte key makes collisions
negligible regardless.
Area 6: keeping Nat.add / div / mod symbolic literal stuck is the correct
normal form (these are irreducible for a symbolic base). try_def_eq_nat
returns eq = 1 only when offsets are equal and bases are def-eq (+ k is
injective); otherwise it falls back to the generic path. BAD negative cases
still get rejected (no false accept), and recursor-on-symbolic-offset reduces
correctly.
Area 8: trimming types to the loose-bvar-reachable suffix changes only the
memo key, not the computation — a closed expr never consults the trimmed
frames, and ctx_reachable's fixpoint keeps every frame a kept frame's stored
type can reach. Eager substitution and all reduction logic are untouched.
Area 9: expr_inst_many is the standard simultaneous de Bruijn substitution
(simul_subst); peel_beta consumes exactly one argument per Lam, so
(λλλ. body) x y z and three sequential single substitutions produce the same
term. The hot/cold split only relocates the BVar arm — identical result.
Eager substitution semantics are unchanged (matches the Lean / Rust reference).
Area 10: gating is sound iff prim_any_addr is 1 whenever the gauntlet would
reduce — skipping it then only elides work that returns "no match". A
differential assert checked exactly this invariant on the suite plus the two
heavy consts (millions of reductions across every primitive family) before the
gate was trusted; the assert caught the only two omissions and then ran clean.
Whole suite: the full ixvm suite passes (297 tests) at every commit.

Expr.Share resolution and universe lookups walked the sharing/univ lists with `list_lookup_u64` (per-step `u64_is_zero` + `relaxed_u64_pred`). Switch to: - universe lookups: `list_lookup(univs, flatten_u64(idx))` — cheap per-step field subtraction instead of the U64 predecessor. - Expr.Share (Convert convert_expr + Ingress deref_share): `let ListNode.Cons(e, _) = load(list_drop(sharing, flatten_u64(idx)));`. list_drop returns the sublist *pointer*; repeated Share lookups collapse to shared memo entries, so the walk is near-free. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10` (Init, owned=3, closure=676): total FFT 3.89G -> 2.11G. The merged pointer-list walk drops from 1.93G (list_lookup) to 5.8M (list_drop, 0.27%). (Measurement run had in-circuit blake3 verification disabled — not committed.)

The primitive-dispatch gauntlet in whnf compares each Const head against ~50 known primitive addresses, so address_eq is the single hottest circuit. The full 32-limb compare ran at width 109 on every call. Split it: `address_eq` now compares limb 0 only and rejects (the common case — a single differing limb proves inequality), delegating the remaining 31-limb compare to a separate `address_eq_tail`. Because Aiur charges a function's full width on every one of its rows, the cold path must be a separate function (a nested match in address_eq changes nothing — measured identical); as its own circuit, address_eq_tail's height is only the rare limb-0-match rows. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10`: - address_eq: width 109 -> 80, FFT 509.8M -> 374.1M - address_eq_tail (new): width 108, height 1683, FFT 1.9M - total shard FFT: 2.106G -> 1.973G Limb-0 is the sweet spot: hot height (259991) dominates tail height (1683), so pulling a second limb forward (N=2) widens the hot circuit by more than it saves in the tail (measured 1.976G, worse). Result is identical to a full 32-limb compare (sound). (Measurement run had in-circuit blake3 verification disabled — not committed.)

whnf_with_spine is the single hottest circuit on reduction-heavy consts: width 89 charged on every one of its 3.36M rows. The width was set by two arms that only a minority of rows reach — the Const-head dispatch (delta/iota/quot/prim gauntlet) and the Proj-head reduction. Factor both into their own functions (`whnf_const_head`, `whnf_proj_head`). Because Aiur charges a function's full width on every row, the ~76% of steps that are App/Lam/Let now run at the narrow residual width instead of 89; the wide dispatch only taxes its own (smaller) row count. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9` (1 reduction-heavy owned const): - whnf_with_spine: width 89 -> 34, FFT 6.48G -> 2.48G - whnf_const_head (new): width 77, height 821490, FFT 1.24G - whnf_proj_head (new): width 56, height 3452, FFT 0.002G - total shard FFT: 45.54G -> 42.78G (-6.1%) The Proj arm is rare (3452 rows) but was nearly half the width; extracting it dropped whnf_with_spine 65 -> 34 for ~zero relocated cost. Pure refactor.

Replace the tail-recursive `collect_spine_go(e, acc)` (and its verbatim copy `collect_spine_simple_go`) with one non-tail `collect_spine(e)` in the kernelTypes block. Repoint all callers (whnf, primitive, inductive_check, def_eq); delete both old copies. Keyed on `e` alone, it now dedups shared sub-spines (0 -> 2.19M cache hits). Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`: - collect_spine: 5.38M rows / 2.65G -> 2.17M rows / 0.96G - list_snoc.Ptr: +0.96G (new) - total shard FFT: 42.78G -> 41.67G

check_all_skipping tested each closure const against the assumption-leaf list via addr_in_list, an O(N) linear address_eq scan — the single largest address_eq source on sharded checks (1.81M calls on Init shard 9, more than all primitive dispatch combined). Build an RBTreeMap skip-set once (keyed on the first 4 address bytes), then test membership with one rbtree lookup + one confirming address_eq. Key collisions overwrite: sound because a missed skip only rechecks a frontier const, and the confirming address_eq rules out false positives. Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`: - address_eq: 2.07M rows / 3.48G -> 274K rows / 0.40G - rbtree build + lookups: +~0.46G - total shard FFT: 41.67G -> 37.84G

try_nat_linear_rec only folded `Nat.rec base (λ_ ih => succ ih) (Lit n)` when the base was also a literal; a symbolic base declined and fell through to n iota steps that materialize succ^n(base). Reduce the symbolic-base case directly to the offset primitive `Nat.add base (Lit n)` instead. This keeps the value in the same compact `base + n` form a literal already has, so def-eq comparing it converges instead of descending n unary succ layers (the UTF-8 `x + 0xC0` reduction class). Offset magnitude stays KLimbs; only the nat_add top-position index is a (small) G. Measured on a `x + D = Nat.rec x succ-step D` benchmark: the unary def-eq descent collapses (nat_succ_of D -> ~constant), ~-15% total FFT at D=512. The remaining cost is the in-whnf chain reduction (substitution), unchanged here.

Reducing `Nat.add x (Lit n)` (symbolic x) delta-unfolded into a succ^n(x) tower, and the nat_succ dispatch then whnf-cascaded the whole chain — O(n) substitution per reduction, the dominant cost on Nat-arithmetic proofs (the UTF-8 codec class). Two coordinated changes: - whnf (try_nat_offset_stuck): leave `Nat.add base (Lit n)` (non-literal base, nonzero n) stuck as a compact offset instead of unfolding it. - def-eq (nat_offset_of + try_def_eq_nat): decompose each side to base + KLimbs offset (reading `Nat.add base (Lit m)` in O(1), peeling the few succs whnf exposes) and decide `base_a ≟ base_b` only when offsets are equal; otherwise fall back. All magnitudes stay KLimbs (no Goldilocks collapse). Measured on `x + D = Nat.rec x succ-step D` (depths 64..512): total FFT goes from O(D) (328M at D=512) to a depth-independent ~28.5M (-91% at D=512). Full `lake test -- --ignored ixvm` passes (297 tests; BAD cases still rejected); a recursor-on-symbolic-offset completeness test reduces correctly.

`Nat.div x (Lit k)` / `Nat.mod x (Lit k)` with a symbolic base delta-unfolded the division algorithm and expanded hugely, even though the result is irreducible for a symbolic base. `Nat.shiftRight x k` unfolds to k nested `Nat.div _ 2`, so a symbolic `>>>` materialized a deep, super-linear tower — the dominant cost in the UTF-8 codec's `c >>> 6/12/18` encoding. Generalize `try_nat_offset_stuck` (which already keeps `Nat.add base (Lit n)` compact) to also leave `Nat.div`/`Nat.mod base (Lit k)` (k ≥ 2) stuck. Thresholds keep `x/1 = x` and `x/0 = 0` reducing. Both `x >>> 18` and `(x >>> 9) >>> 9` then reduce to the same nested-stuck-div form, so def-eq stays structural and the identity still holds. Measured on `x >>> 18 = (x >>> 9) >>> 9`: total FFT 15.65G -> 455M (-97%), now depth-independent (matches `x >>> 6`). No regression on the Nat.add reproducers; full `lake test -- --ignored ixvm` passes (297 tests).

Ingress deserialization (parsing every closure const's Ixon bytes into Expr trees) is the dominant cost when checking large-closure constants. Two width cuts on the hottest parsers: - get_expr: the Let arm (three recursive get_expr calls) set the function width though Let is rare. Factored into get_expr_let — width 102 -> 75. - get_app_telescope: passed `func` by value (the wide Expr union) on every recursion step. Pass it by `&Expr`, loaded only at the base case — width 95 -> 78. Measured by profiling ingress of the `ByteArray.utf8DecodeChar?…` closure (2000 consts): get_expr 766M -> 563M, get_app_telescope 477M -> 392M, total ingress FFT 3.13G -> 2.85G. Full `lake test -- --ignored ixvm` passes (297).

Key whnf / k_infer / k_is_def_eq memoization on the loose-bvar-reachable suffix of the local context (ctx_trim / ctx_reachable), mirroring Rust's whnf_key / infer_key / def_eq_ctx_key. A closed subterm (lbr 0) keys to the empty context and shares its result across every binder depth; eager substitution is unchanged. Fast paths (closed -> Nil, full-reach -> as-is) keep the boundary cheap. FFT (lake exe ix check <const> --stats-out): Nat.add_comm 58.28M -> 57.90M (-0.66%); Int.emod_emod_of_dvd 5.090G -> 4.691G (-7.85%); Init shard 10 4.738G -> 4.721G (-0.37%). 297 ixvm tests pass.

parseName can't rebuild a private name (_private.M.0.foo) from its display string -- the marker / scope-index components don't round-trip through naive dot-splitting. Add toString fallbacks: resolveName for the compiled-in env and resolveIxeAddr for a .ixe env, so check and prove (which share forEachClaim) resolve names like _private.Init.Data.Vector.Extract.0.Vector.extract_append._proof_1 across both the compiled-in and --ixe paths.

whnf_apply_beta substituted one spine arg per expr_inst1, re-walking the (shrinking) body for each, so a K-arg application `(λλλ.body) x y z` walked the body K times. Mirror Rust whnf.rs:541-567: peel the whole telescope (peel_beta) and substitute all consumed args in ONE expr_inst_many walk (simul_subst semantics). Single-arg betas stay on the cheap expr_inst1 path, so short telescopes don't pay the list overhead (why the earlier unconditional simul_subst regressed). Hot/cold split expr_inst_many_walk's BVar arm (list_length / list_lookup / expr_lift) into expr_inst_many_bvar so the hot App/Lam walk stays narrow. Substitution was 53% of FFT on omega-heavy proofs. FFT (ix check <const>): _private.…Vector.extract_append._proof_1 56.64G -> 38.31G (-32.4%); Nat.add_comm 57.90M -> 56.17M; Init shard 10 4.721G -> 4.668G (-1.1%); shard 9 16.346G -> 16.314G. 297 ixvm tests pass.

whnf_const_head ran 5 try_* dispatchers (nat / str / bitvec / native / decidable) in sequence on every Const head, each comparing the head address against its family's primitive addresses — wasted on the common non-primitive head, and worse, several try_* speculatively WHNF their args (to test for literals / decidables), cascading into deep reductions that are then thrown away. Factor the chain into try_address_primitives and gate it behind prim_any_addr: a memoized (per head address) sum of the 43 head-dispatched primitive address_eq checks — 1 only for a real primitive, so a 0 skips the whole gauntlet straight to projection-definition / delta. Validated by a differential assert (prim_any_addr must be 1 whenever the gauntlet matched) run over the 297-test suite + Int.emod_emod_of_dvd + Vector.extract_append._proof_1; it caught two missing addresses (string_utf8_byte_size, punit_size_of_1), then passed clean across millions of reductions. Assert removed after validation. FFT (ix check <const>): _private.…Vector.extract_append._proof_1 38.31G -> 28.02G (-26.8%); Int.emod_emod_of_dvd 4.69G -> 3.97G (-15%); Nat.add_comm 56.17M -> 56.06M; Init shard 9 16.314G -> 16.279G; shard 10 4.668G -> 4.659G. 297 ixvm tests pass.

johnchandlerburnham · 2026-06-11T23:49:40Z

Are any of these improvements able to be ported to the Rust kernel implementation?

Also, can we run execution out-of-circuit on Init, Lean & Mathlib?

arthurpaulino added 5 commits June 9, 2026 11:49

arthurpaulino requested a review from gabriel-barrett June 10, 2026 11:31

arthurpaulino added 8 commits June 10, 2026 08:05

arthurpaulino enabled auto-merge (squash) June 12, 2026 01:09

johnchandlerburnham approved these changes Jun 12, 2026

View reviewed changes

arthurpaulino merged commit 096fe8e into main Jun 12, 2026
14 checks passed

arthurpaulino deleted the ap/optimize branch June 12, 2026 01:24

samuelburnham mentioned this pull request Jun 12, 2026

perf(aiur): Canonical levels, memoized dispatch, flat query record #441

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IxVM kernel: cut Aiur FFT cost on reduction- and check-heavy shards#438

IxVM kernel: cut Aiur FFT cost on reduction- and check-heavy shards#438
arthurpaulino merged 13 commits into
mainfrom
ap/optimize

arthurpaulino commented Jun 10, 2026 •

edited

Loading

Uh oh!

johnchandlerburnham commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

arthurpaulino commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Method

Areas

Tooling

Soundness

Uh oh!

johnchandlerburnham commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

arthurpaulino commented Jun 10, 2026 •

edited

Loading